Spoken Term Detection Using Spoken Document Index Based on Keywords Collected from Automatic Speech Recognition Result

نویسندگان

  • Kentaro Domoto
  • Takehito Utsuro
  • Hiromitsu Nishizaki
چکیده

This paper presents a novel spoken document indexing framework for Spoken Term Detection (STD). Our proposed method utilizes an STD method for making an index from keywords collected from outputs from automatic speech recognition systems. The STD method is conducted for all the keywords as query terms; then, the detection result, a set of each keyword and its detection intervals in the spoken document, is obtained. For the keywords that have competitive intervals, we rank them based on the matching cost of STD and select the best one with the longest duration among competitive detections. This is the final output of STD process and serves as an index word for the spoken document. The proposed framework was evaluated on real lecture speeches as spoken documents in an STD task. The results show that our framework was quite effective for preventing false detection errors and in annotating keyword indices to spoken documents. 

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Spoken Term Detection for Persian News of Islamic Republic of Iran Broadcasting

Islamic Republic of Iran Broadcasting (IRIB) as one of the biggest broadcasting organizations, produces thousands of hours of media content daily. Accordingly, the IRIBchr('39')s archive is one of the richest archives in Iran containing a huge amount of multimedia data. Monitoring this massive volume of data, and brows and retrieval of this archive is one of the key issues for this broadcasting...

متن کامل

Spoken Term Detection Using SVM-Based Classifier Trained with Pre-Indexed Keywords

This study presents a two-stage spoken term detection (STD) method that uses the same STD engine twice and a support vector machine (SVM)-based classifier to verify detected terms from the STD engine’s output. In a front-end process, the STD engine is used to preindex target spoken documents from a keyword list built from an automatic speech recognition result. The STD result includes a set of ...

متن کامل

Two-step spoken term detection using SVM classifier trained with pre-indexed keywords based on ASR result

This paper presents a novel two-step spoken term detection (STD) method that uses the same STD engine twice and a support vector machine (SVM)-based classifier to verify detected terms from the output of the second STD engine. In the first STD process, pre-indexing of the target spoken documents from a keyword list built from the results of automatic speech recognition of the speeches is perfor...

متن کامل

Combining State-level and DNN-based Acoustic Matches for Efficient Spoken Term Detection in NTCIR-12 SpokenQuery&Doc-2 Task

Recently, in spoken document retrieval task such as spoken term detection (STD), there has been increasing interest in using a spoken query. In STD systems, automatic speech recognition (ASR) frontend is often employed for its reasonable accuracy and efficiency. However, out-of-vocabulary (OOV) problem at ASR stage has a great impact on the STD performance for spoken query. In this paper, we pr...

متن کامل

Spoken Term Detection Using Phoneme Transition Network from Multiple Speech Recognizers' Outputs

Spoken Term Detection (STD) that considers the out-of-vocabulary (OOV) problem has generated significant interest in the field of spoken document processing. This study describes STD with false detection control using phoneme transition networks (PTNs) derived from the outputs of multiple speech recognizers. PTNs are similar to subword-based confusion networks (CNs), which are originally derive...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2016